Existing person re-identification benchmarks and methods mainly focus on matching cropped pedestrian images between queries and candidates. However, this setting differs from real-world scenarios, where annotated pedestrian bounding boxes are unavailable and the target person must be searched for in a gallery of whole scene images. To close this gap, we propose a new deep learning framework for person search. Instead of breaking the problem down into two separate tasks, pedestrian detection and person re-identification, we handle both aspects jointly in a single convolutional neural network. We propose an Online Instance Matching (OIM) loss function to train the network effectively; it scales to datasets with numerous identities. To validate our approach, we collect and annotate a large-scale benchmark dataset for person search, containing 18,184 images, 8,432 identities, and 96,143 pedestrian bounding boxes. Experiments show that our framework outperforms approaches that handle the two tasks separately, and that the proposed OIM loss converges much faster and better than the conventional Softmax loss.
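To make the OIM idea concrete, the following is a minimal NumPy sketch of one plausible form of such a loss: each labeled identity keeps a running-average feature in a lookup table (LUT), unlabeled detections sit in a circular queue, and a query feature is classified by softmax cross-entropy over its cosine similarities to all stored entries. The function names, the temperature, and the momentum value are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def l2norm(x):
    # Normalize feature vectors to unit length along the last axis.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def oim_loss(feat, label, lut, queue, temperature=0.1, momentum=0.5):
    """Illustrative OIM-style loss for one L2-normalized feature.

    feat  : (d,) query feature (unit norm)
    label : index of its identity in the LUT
    lut   : (num_ids, d) running-average features of labeled identities
    queue : (q, d) features of unlabeled detections (extra negatives)
    """
    # Cosine similarities to every labeled and unlabeled entry,
    # sharpened by a softmax temperature (value here is an assumption).
    logits = np.concatenate([lut @ feat, queue @ feat]) / temperature
    # Numerically stable softmax cross-entropy; only labeled
    # identities (LUT indices) can be the target class.
    logits -= logits.max()
    probs = np.exp(logits) / np.exp(logits).sum()
    loss = -np.log(probs[label] + 1e-12)
    # Online update: exponential moving average of this identity's
    # LUT entry, re-normalized to unit length.
    lut[label] = l2norm(momentum * lut[label] + (1 - momentum) * feat)
    return loss
```

Because the classifier is a similarity lookup rather than a learned weight matrix, the memory and compute per step grow with the table size instead of requiring a full softmax layer over all identities, which is what makes this style of loss scalable to datasets with many identities.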